ABSTRACT

The past few years have seen rapid growth in data mining approaches for analyzing data obtained from Massive Open Online Courses (MOOCs). The objective of this study is to develop approaches that predict the score a student may achieve on a given graded assessment, based on information about the student's prior performance and prior activity in the course. We develop a personalized linear multiple regression (PLMR) model to predict a student's grade before the student attempts the assessment activity. The developed model is real-time: it tracks the participation of a student within a MOOC (via click-stream server logs) and predicts the student's performance on the next assessment within the course offering. We perform a comprehensive set of experiments on data obtained from two OpenEdX MOOCs via a Stanford University initiative. Our experimental results show the promise of the proposed approach in comparison to baseline approaches, and the approach also helps identify key features associated with the study habits and learning behaviors of students.
1. INTRODUCTION

Since their inception, Massive Open Online Courses (MOOCs) have aimed at delivering online learning on a wide variety of topics to a large number of participants across the world. Due to the low cost (most times zero) and the lack of entry barriers (e.g., prerequisites or skill requirements) for participants, a large number of students enroll in MOOCs, but only a small fraction of them stay engaged with the learning materials and participate in the various activities associated with the course offering, such as viewing the video lectures, studying the material, and completing the various quizzes and homework-based assessments.
Given this high attrition rate and the potential of MOOCs to deliver low-cost but high-quality education, several researchers have analyzed the server logs associated with these MOOCs to determine the factors associated with students dropping out. Several predictive methods have been developed to predict when a participant will drop out from a MOOC [4, 5, 6, 14]. Using self-reported surveys, studies have determined the different motivations for students enrolling and participating in a MOOC. Participants enroll in a MOOC to learn a subset of topics within the curriculum, to earn degree certificates for future career promotion or college credit, for the social experience, and/or to explore free online education [8]. Students with similar motivations can have different learning outcomes from a MOOC, depending on the number of hours invested, prior educational background, knowledge and skills [4].


In this paper, we present models to predict a student's future performance on a given assessment activity within a MOOC. Specifically, we develop an approach based on personalized linear multi-regression (PLMR) to predict the performance of a student as they attempt various graded activities (assessments) within the MOOC. This approach was previously studied in the context of predicting a student's performance on graded activities within a traditional university course, using data extracted from a learning management system (Moodle) [3]. The developed model is real-time: it tracks the participation of a student within a MOOC (via click-stream server logs) and predicts the student's performance on the next assessment within the course offering. Our approach also allows us to capture the varying study patterns of different students that are responsible for their performance. We evaluate our predictive model on two MOOCs offered on the OpenEdX platform and made available for learning analytics research by the Center for Advanced Research through Online Learning at Stanford University.


We extract features that seek to identify the learning behavior and study habits of different students. These features capture the various interactions that reflect engagement, effort, learning and behavior for a given student participating in a MOOC by viewing the various video- and text-based materials available within the MOOC offering, coupled with the student's attempts on graded and non-graded activities such as quizzes and homeworks. Our experimental evaluation shows accurate grade prediction for different types of homework assessments in comparison to baseline models. Our approach also identifies the features that are useful for accurately predicting a homework grade.


2. RELATED WORK 

Several researchers have focused on the analysis of education data (including MOOCs) in an effort to understand the characteristics of student learning behaviors and motivation within this education model [11]. Brinton et al. [1] developed an approach to predict whether a student answers a question correctly on the first attempt, using click-stream information and social learning networks. Kennedy et al. [7] analyzed the relationship between a student's prior knowledge and end-of-MOOC performance. Sunar et al. [12] developed an approach to predict the possible interactions between peers participating in a MOOC. Elbadrawy et al. [3] proposed the use of personalized linear multi-regression models to predict student performance in a traditional university setting, using data extracted from a course management system (Moodle). Our study focuses on MOOCs, which present different assumptions, challenges and features in comparison to a traditional university environment.


Most similar to our proposed work, Pardos et al. proposed the “Item Difficulty Effect Model” (IDEM), which incorporates the difficulty levels of different questions by modifying the Bayesian Knowledge Tracing (BKT) model [2] to add an “Item” node to every question node. Designed to address the challenges of modeling MOOC data, the IDEM approach and its extensions, which split questions into several sub-parts and incorporate resource (knowledge) information [9], are considered state-of-the-art MOOC assessment prediction approaches and are referred to as KT-IDEM. However, this approach can only predict a binary grade. In contrast, the model proposed in this paper can predict both continuous and binary grades.


3. METHODS 


3.1 Personalized Linear Multi-Regression Models
We train a personalized linear multi-regression (PLMR) model [3] to predict student performance within a MOOC. Specifically, the grade $\hat{g}_{s,a}$ for a student $s$ on an assessment activity $a$ is predicted as follows:


$$\hat{g}_{s,a} = b_s + p_s W f_{s,a}^{T} = b_s + \sum_{d=1}^{l} \left( p_{s,d} \sum_{k=1}^{n_F} f_{s,a,k}\, w_{d,k} \right) \qquad (1)$$
where $b_s$ is the bias term for student $s$ and $f_{s,a}$ is the feature vector of the interaction between student $s$ and activity $a$. The features extracted from the MOOC server logs are described in the next section. $n_F$ is the length of $f_{s,a}$, i.e., the dimension of our feature space. $l$ is the number of linear regression models, $W$ is the $l \times n_F$ coefficient matrix that holds the coefficients of the $l$ linear regression models, and $p_s$ is a vector of length $l$ that holds the memberships of student $s$ in the $l$ different regression models [3]. Using lasso [13], we solve the following optimization problem:


$$\operatorname*{minimize}_{W,P,B}\; L(W,P,B) + \lambda\left(\lVert P\rVert_F + \lVert W\rVert_F\right) \qquad (2)$$


where $W$, $P$ and $B$ denote the feature weights, student memberships and bias terms, respectively. The loss function $L(\cdot)$ is the least-squares loss for regression problems. $\lambda(\lVert P\rVert_F + \lVert W\rVert_F)$ is a regularizer that controls model complexity by constraining the values of the feature weights and student memberships. Tuning the scalar $\lambda$ prevents the model from over-fitting.
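As an illustration of Eq. (1) and of one way to fit it, the following NumPy sketch predicts grades and learns $b$, $P$ and $W$ by full-batch gradient descent. It is a minimal sketch under stated assumptions, not the authors' solver: the function names, the initialization, the learning rate and the number of epochs are hypothetical, and the regularizer is replaced by a squared Frobenius penalty $\lambda(\lVert P\rVert_F^2 + \lVert W\rVert_F^2)$ so that every term is differentiable.

```python
import numpy as np


def plmr_predict(b, P, W, students, X):
    """Eq. (1): g_hat[n] = b_s + p_s W f_n^T, where s = students[n] and f_n = X[n]."""
    return b[students] + np.einsum("nl,lf,nf->n", P[students], W, X)


def plmr_fit(students, X, y, n_students, l=5, lam=0.01, lr=0.05, epochs=2000, seed=0):
    """Fit biases b (n_students,), memberships P (n_students, l) and weights W (l, n_F)
    by full-batch gradient descent on MSE + lam * (||P||_F^2 + ||W||_F^2).
    Assumption: squared Frobenius penalty instead of the exact regularizer of Eq. (2)."""
    rng = np.random.default_rng(seed)
    n, n_f = X.shape
    b = np.zeros(n_students)
    P = rng.uniform(0.0, 1.0, size=(n_students, l)) / l
    W = rng.normal(0.0, 0.1, size=(l, n_f))
    for _ in range(epochs):
        r = plmr_predict(b, P, W, students, X) - y       # residuals
        g = 2.0 * r / n                                    # d(MSE)/d(prediction)
        grad_b = np.bincount(students, weights=g, minlength=n_students)
        grad_P = np.zeros_like(P)
        # Each sample adds g_n * (W f_n) to the gradient row of its own student.
        np.add.at(grad_P, students, g[:, None] * (X @ W.T))
        # Sum over samples of g_n * outer(p_s, f_n).
        grad_W = (g[:, None] * P[students]).T @ X
        # All gradients were computed from the old parameters before any update.
        b -= lr * grad_b
        P -= lr * (grad_P + 2.0 * lam * P)
        W -= lr * (grad_W + 2.0 * lam * W)
    return b, P, W
```

Because $p_s$ and $W$ enter Eq. (1) as a product, the objective is not jointly convex, so the fitted parameters can depend on the random initialization; a small learning rate keeps the updates stable.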


3.2 Feature Description 

We extract features from MOOC server logs and formulate the PLMR model to predict the real-time assessment grade for a given student. Figure 1 shows the various activities generally available within a MOOC. Fig 1 (a) shows that each homework has corresponding quizzes, each of which has a corresponding video as a learning resource. Fig 1 (b) shows that while watching a video, a student can perform a series of actions. Fig 1 (c) shows that while studying in a MOOC, a student can have several login sessions. To capture the latent information behind the click-stream of each student, we extract six types of features: (i) session features, (ii) quiz related features, (iii) video related features, (iv) homework related features, (v) time related features and (vi) interval-based features. These features constitute the feature vector $f_{s,a}$ for a student and a homework assessment. The features are described as follows:


Figure 1: Different activities within a MOOC. [Panels: (a) homeworks and their corresponding quizzes; (b) actions while watching one video, e.g., play video and pause video; (c) study sessions and adjacent logins separated by a no-activity period.]


(i) Session features:

A single study session is defined by a student login combined with the various study interactions that the student may partake in. Since students do not always log out of a session, we assume that a “no activity” period of more than one hour constitutes the student logging out of a session. Fig 1 (c) shows a “no activity” period for a student between two consecutive sessions.
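The one-hour rule can be sketched as a single pass over a student's sorted event timestamps. This assumes the click-stream has already been reduced to one list of event times per student; `split_sessions` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Inactivity threshold from the text: more than one hour without activity ends a session.
SESSION_GAP = timedelta(hours=1)


def split_sessions(timestamps, gap=SESSION_GAP):
    """Group one student's click-stream event times into study sessions.

    A new session starts whenever the time since the previous event exceeds
    `gap`; a gap of exactly `gap` keeps the events in the same session."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)  # still inside the current session
        else:
            sessions.append([t])    # "no activity" period exceeded: new session
    return sessions


# Example: the 1 h 40 min gap between 9:50 and 11:30 splits the events into two sessions.
example = split_sessions(
    [datetime(2015, 1, 10, h, m) for h, m in [(9, 0), (9, 20), (9, 50), (11, 30), (11, 40)]]
)
```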


• NumSession is the average number of daily study sessions a student engages in before a homework attempt.

• AvgSessionLen is the average length of each session in minutes. We calculate the average study time of a study session as

$$\text{AvgSessionLen} = \frac{\text{Total study time}}{\text{NumSession}} \qquad (3)$$
• AvgNumLogin. Students are free to choose when to log in and study in a MOOC environment. We consider a day a “work day” if a student logs into the study system, and a “rest day” if the student does not. The ratio of “work” to “rest” days can capture a student's learning habits and engagement characteristics.

$$\text{AvgNumLogin} = \frac{\#\,\text{work days}}{\#\,\text{work days} + \#\,\text{rest days}} \qquad (4)$$
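A sketch of the three session features, computed over a window such as the course start to the day before a homework attempt. Assumptions not stated in the text: the daily average for NumSession is taken over all days in the window, a session's length is the time between its first and last event, and AvgSessionLen divides total study time by the number of sessions in the window (reading the denominator of Eq. (3) as a session count). The helper name is hypothetical.

```python
from datetime import date, datetime


def session_features(sessions, start: date, end: date):
    """Return (NumSession, AvgSessionLen, AvgNumLogin) for one student over the
    inclusive window [start, end].

    `sessions` is a list of sessions, each a sorted list of datetimes."""
    n_days = (end - start).days + 1
    # Keep only sessions that begin inside the window.
    in_window = [s for s in sessions if start <= s[0].date() <= end]
    # Assumption: the "daily" average is taken over every day of the window.
    num_session = len(in_window) / n_days
    # Session length = time between first and last event, in minutes.
    total_minutes = sum((s[-1] - s[0]).total_seconds() / 60 for s in in_window)
    avg_session_len = total_minutes / len(in_window) if in_window else 0.0
    # A "work day" has at least one event; every other day is a "rest day".
    work_days = {t.date() for s in in_window for t in s if start <= t.date() <= end}
    avg_num_login = len(work_days) / n_days  # = work / (work + rest)
    return num_session, avg_session_len, avg_num_login


# Example: three sessions on Jan 1 and Jan 3, within a four-day window.
example = session_features(
    [
        [datetime(2015, 1, 1, 9, 0), datetime(2015, 1, 1, 9, 30)],
        [datetime(2015, 1, 1, 12, 0), datetime(2015, 1, 1, 12, 10)],
        [datetime(2015, 1, 3, 10, 0), datetime(2015, 1, 3, 10, 20)],
    ],
    date(2015, 1, 1),
    date(2015, 1, 4),
)
```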


(ii) Quiz Related features: 


• NumQuiz is the number of quizzes a student takes before a homework attempt. This feature reflects the student's dedication to the course material, a potential factor in homework performance.


• AvgQuiz is the average number of attempts for each quiz. The MOOCs studied in this paper allow unlimited attempts on a quiz.
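NumQuiz and AvgQuiz reduce to counting a student's quiz attempts made before the homework attempt. A minimal sketch, assuming attempts are available as (quiz_id, timestamp) pairs with comparable timestamps; `quiz_features` is a hypothetical name.

```python
from collections import Counter


def quiz_features(quiz_attempts, hw_time):
    """Return (NumQuiz, AvgQuiz) from (quiz_id, timestamp) pairs for one student.

    Only attempts strictly before the homework attempt at `hw_time` are counted."""
    attempts_per_quiz = Counter(q for q, t in quiz_attempts if t < hw_time)
    num_quiz = len(attempts_per_quiz)  # distinct quizzes taken
    avg_quiz = sum(attempts_per_quiz.values()) / num_quiz if num_quiz else 0.0
    return num_quiz, avg_quiz
```

For instance, two attempts on q1 and one on q2 before the homework give NumQuiz = 2 and AvgQuiz = 1.5.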


(iii) Video Related features: 


• VideoNum denotes the number of distinct video sessions for a student before a homework attempt.

• VideoNumPause is the average number of pause actions per video. There are several actions associated with watching a video, including “pause video”, “play video” and “load video”. Tracking these actions allows us to capture a student's focus level and learning habits.

• VideoViewTime is the total video viewing time.

• VideoPctWatch. In many cases, students do not finish watching a full video. As such, we calculate the average percentage of each video that was watched.
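VideoNumPause and VideoPctWatch can be derived from the play and pause events of each video. The sketch below assumes events arrive in time order as (video_id, event_type, position_in_seconds) tuples, with event-type strings modeled on edX tracking-log names; a watched segment runs from a play position to the next pause position, and overlapping segments are counted once. All helper names are hypothetical.

```python
from collections import defaultdict


def merged_length(segments):
    """Total length covered by (start, end) intervals, counting overlaps once."""
    total, cur_start, cur_end = 0.0, None, None
    for s, e in sorted(segments):
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlapping segment: extend the block
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def video_features(events, durations):
    """Return (VideoNumPause, VideoPctWatch) for one student.

    events: time-ordered (video_id, event_type, position_sec) tuples.
    durations: dict video_id -> video length in seconds."""
    pauses = defaultdict(int)
    segments = defaultdict(list)
    play_pos = {}
    for vid, kind, pos in events:
        if kind == "play_video":
            play_pos[vid] = pos
        elif kind == "pause_video":
            pauses[vid] += 1
            if vid in play_pos:
                segments[vid].append((play_pos.pop(vid), pos))
    videos = {vid for vid, _, _ in events}
    num_pause = sum(pauses.values()) / len(videos) if videos else 0.0
    # Fraction of each video covered by watched segments, capped at 1.
    pct = [min(merged_length(segments[v]) / durations[v], 1.0) for v in videos]
    pct_watch = sum(pct) / len(pct) if pct else 0.0
    return num_pause, pct_watch
```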


(iv) Homework Related features: 


• HWProblemSave is the average number of “save answer” actions for each homework assessment. Before submitting answers for a homework, students are allowed to save their answer sheet and check it as many times as they need. This feature is more valuable when the MOOC provides only one chance for a homework answer submission.


(v) Time Related features: 


• TimeHwQuiz is the time between a homework answer submission and the last quiz attempt.

• TimeHwVideo is the time between a homework answer submission and the last video watching activity.

• TimePlayVideo is the percentage of all study sessions that include video watching activity.

• HwSessions is the number of sessions that have homework related activities (save and submit).
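The time-related features can be computed from event timestamps and per-session activity labels. A minimal sketch, assuming times are plain numbers (e.g., hours since the course started) and each session is summarized as a set of hypothetical activity labels such as "video", "quiz", "hw_save" and "hw_submit"; TimeHwQuiz and TimeHwVideo are returned as None when no earlier quiz or video event exists.

```python
def time_features(hw_submit_time, quiz_times, video_times, session_activities):
    """Return (TimeHwQuiz, TimeHwVideo, TimePlayVideo, HwSessions) for one student."""
    last_quiz = max((t for t in quiz_times if t <= hw_submit_time), default=None)
    last_video = max((t for t in video_times if t <= hw_submit_time), default=None)
    time_hw_quiz = None if last_quiz is None else hw_submit_time - last_quiz
    time_hw_video = None if last_video is None else hw_submit_time - last_video
    n = len(session_activities)
    # Percentage of sessions containing any video-watching activity.
    video_sessions = sum("video" in acts for acts in session_activities)
    time_play_video = 100.0 * video_sessions / n if n else 0.0
    # Sessions with at least one homework save or submit action.
    hw_sessions = sum(bool(acts & {"hw_save", "hw_submit"}) for acts in session_activities)
    return time_hw_quiz, time_hw_video, time_play_video, hw_sessions
```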


(vi) Interval-Based features:
We expect some changes in study activities once students know their grade on the previous homework; for example, they may study harder if they do not get a satisfactory score. The interval-based features aim to represent the different activities between two consecutive homeworks.


• IntervalNumQuiz denotes the number of quizzes the student takes between two homeworks.

• IntervalQuizAttempt is the average number of quiz attempts between two homeworks.

• IntervalVideo is the number of videos a student watches between two homeworks.

• IntervalDailySession is the average number of sessions per day between two homeworks.

• IntervalLogin is the percentage of login days between two homeworks.
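The interval-based features restrict the same kinds of counts to the days between two consecutive homework submissions. The sketch assumes every event carries an integer day index, that the interval covers the days after the previous submission up to and including the current one, that IntervalQuizAttempt is averaged per quiz, and that IntervalVideo counts distinct videos; `interval_features` is a hypothetical name.

```python
from collections import Counter


def interval_features(quiz_attempts, video_views, session_days, prev_hw_day, cur_hw_day):
    """Return (IntervalNumQuiz, IntervalQuizAttempt, IntervalVideo,
    IntervalDailySession, IntervalLogin) for days prev_hw_day+1 .. cur_hw_day.

    quiz_attempts: (quiz_id, day) pairs; video_views: (video_id, day) pairs;
    session_days: start day of each study session."""
    n_days = cur_hw_day - prev_hw_day
    if n_days <= 0:
        raise ValueError("cur_hw_day must come after prev_hw_day")

    def inside(day):
        return prev_hw_day < day <= cur_hw_day

    quiz_counts = Counter(q for q, day in quiz_attempts if inside(day))
    interval_num_quiz = len(quiz_counts)
    interval_quiz_attempt = (sum(quiz_counts.values()) / interval_num_quiz
                             if interval_num_quiz else 0.0)
    interval_video = len({v for v, day in video_views if inside(day)})
    days_in = [day for day in session_days if inside(day)]
    interval_daily_session = len(days_in) / n_days
    interval_login = 100.0 * len(set(days_in)) / n_days  # percentage of login days
    return (interval_num_quiz, interval_quiz_attempt, interval_video,
            interval_daily_session, interval_login)
```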


We also use a student's cumulative grade so far on quizzes and homeworks as a feature and denote it by Meanscore. For our baseline approach, we only consider the averages computed on the previous homeworks.


Figure 2: Distribution of students attempting each assessment. StMed and StLearn had 6 and 9 homework assessments, respectively.


4. EXPERIMENTS 

4.1 Datasets 

We evaluated our methods on two MOOCs: “Statistics in Medicine” (represented as StMed in this paper), taught in Summer 2014, and “Statistical Learning” (represented as StLearn in this paper), taught in Winter 2015.

StMed: This dataset includes server logs tracking information about a student viewing video lectures, checking text/web articles, and attempting quizzes and homeworks (which are graded). Specifically, this MOOC contains 9 learning units with 111 assessments, including 79 quizzes, 6 homeworks and 26 single questions. The course had 13,130 students enrolled, among which 4337 students submitted at least one assignment (quiz or homework) and had corresponding scores; 1262 students completed part of the six homeworks and 1099 students attempted all the homeworks. 193 students attempted all 79 quizzes and all six homeworks. This course had 131 videos, and 6481 students had video related activity.




Figure 3: AllStMed Prediction Results. RMSE (lower is better).


Figure 4: AllStLearn Prediction Results. Accuracy (higher is better).


StLearn: This course had ten units. Except for the first unit, all units have quizzes and end-of-unit homeworks, which add up to 103 assessments in total. 52,821 students enrolled in this course, and 4987 students had assessment activities; 3509 students attempted a subset of the available homeworks, while 346 students attempted all 9 homeworks, and 118 students attempted all 103 assessments. The key difference between the homeworks in StLearn and those in StMed is that each StLearn homework has only one question, which a student answers either correctly or incorrectly. As such, scoring in this MOOC is binary instead of continuous. To predict whether a student answers a question correctly, we reformulate the regression problem as a classification problem using a logistic loss function. Figure 2 shows the distribution of students attempting the different assessments available across the two MOOCs studied here.
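For the binary StLearn grades, one way to apply a logistic loss is to pass the PLMR score $b_s + p_s W f_{s,a}^T$ through a sigmoid. A sketch under that assumption; the 0.5 decision threshold and the helper names are hypothetical. The derivative of the mean logistic loss with respect to each score is $(\text{prob} - y)/N$, so a gradient-based solver for the least-squares objective can be reused by substituting this term for the squared-error derivative.

```python
import numpy as np


def plmr_logit_predict(b, P, W, students, X, threshold=0.5):
    """Return (probability of a correct answer, 0/1 prediction) for each sample."""
    score = b[students] + np.einsum("nl,lf,nf->n", P[students], W, X)
    prob = 1.0 / (1.0 + np.exp(-score))  # sigmoid of the PLMR score
    return prob, (prob >= threshold).astype(int)


def logistic_loss(prob, y, eps=1e-12):
    """Mean logistic (cross-entropy) loss for 0/1 labels y."""
    prob = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob)))


def logistic_grad_wrt_score(prob, y):
    """Gradient of the mean logistic loss with respect to each score."""
    return (prob - y) / len(y)
```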


4.2 Experimental Protocol 

To gain deeper insight into students' performance in a MOOC, we perform two types of experiments. Given $n$ homework assessments $\{H_1, \ldots, H_n\}$, our objective is to predict the score a student achieves on each of the $n$ homeworks. Depicting the most realistic setting, for the $i$-th homework $H_i$ we define the training set as all student-homework pairs with an attempt and a score for homeworks $H_1$ through $H_{i-1}$. To predict the score on $H_i$ for a given student, we use all the features extracted just before the student attempts the target homework $H_i$. We refer to this as PreviousHW-based Prediction. Second, to predict the score on the $i$-th homework $H_i$, we use training data restricted to the student-homework pairs of only the previous homework, i.e., $H_{i-1}$. This experiment is referred to as PreviousOneHW-based Prediction. Note that in both cases we cannot make any prediction for the first homework ($H_1$), since no training information is available for a given student.
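The two training regimes differ only in which earlier homeworks contribute training pairs. A minimal sketch, assuming one record per student-homework pair with hypothetical keys `student`, `hw` (1-based homework index), `x` (features extracted just before that attempt) and `y` (score); `split_for_homework` is a hypothetical name.

```python
def split_for_homework(records, i, previous_one=False):
    """Build (train, test) record lists for predicting homework H_i (1-based).

    PreviousHW-based: train on H_1 .. H_{i-1}.
    PreviousOneHW-based (previous_one=True): train on H_{i-1} only."""
    if i < 2:
        raise ValueError("H_1 cannot be predicted: there is no earlier homework to train on")
    first = i - 1 if previous_one else 1
    train = [r for r in records if first <= r["hw"] <= i - 1]
    test = [r for r in records if r["hw"] == i]
    return train, test
```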


4.3 Data Partition 

We partition the students of StLearn and StMed into two groups: the group of students who attempt all the requested homeworks, and the group of students who finish only some of the homeworks. This allows us to consider the different motivations and expectations of students enrolling in a MOOC. For example, students who aim to learn in a MOOC may choose watching videos over taking all homeworks, while students who want to achieve a degree certificate may focus on homework completeness. We refer to the first group as the “All homeworks accomplished group” and to the second group as the “Partial homeworks accomplished group”. We evaluate our models on the two groups for both the StMed and StLearn datasets. Specifically, we name the four groups of students AllStMed, AllStLearn, PartialStMed and PartialStLearn based on their group and MOOC.
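The partition amounts to a filter on the number of homeworks each student attempted. A sketch, assuming a mapping from each student to the set of attempted homework indices; excluding students with no homework attempt from both groups is an assumption, and the helper name is hypothetical.

```python
def partition_students(attempted, n_homeworks):
    """Split students into (all_group, partial_group).

    attempted: dict student -> set of attempted homework indices.
    Students with no homework attempts belong to neither group."""
    all_group = {s for s, hws in attempted.items() if len(hws) == n_homeworks}
    partial_group = {s for s, hws in attempted.items() if 0 < len(hws) < n_homeworks}
    return all_group, partial_group
```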


HW# | PLMR | Meanscore
2 | 0.230 | 0.248
3 | 0.162 | 0.176
4 | 0.176 | 0.196
5 | 0.144 | 0.156
6 | 0.143 | 0.150
Avg | 0.171 | 0.185


Table 1: PreviousHW-based prediction performance (RMSE) comparison for AllStMed.
Figure 5: Predictive Performance with Removal of Feature Types.


4.4 Evaluation Metrics 

The StMed course has continuous homework scores, which are scaled between 0 and 1. However, the homework score is binary in the StLearn course, indicating whether the student answers a question correctly or incorrectly. For StLearn, we therefore use a logistic loss and formulate a classification problem instead of the regression problem used for the StMed course. To evaluate the performance of our approach, we use the root mean squared error (RMSE) as the metric of choice for the regression problem. For the classification problem, we use accuracy and the F1-score (the harmonic mean of precision and recall), which is known to be a suitable metric for imbalanced datasets.


HW# | Acc. PLMR | Acc. Meanscore | Acc. KT-IDEM | F1 PLMR | F1 Meanscore | F1 KT-IDEM
2 | 0.641 | 0.646 | 0.623 | 0.775 | 0.777 | 0.768
3 | 0.760 | 0.580 | 0.681 | 0.821 | 0.805 | 0.810
4 | 0.754 | 0.710 | 0.739 | 0.838 | 0.706 | 0.850
5 | 0.867 | 0.809 | 0.829 | 0.920 | 0.880 | 0.906
6 | 0.730 | 0.678 | 0.667 | 0.808 | 0.776 | 0.800
7 | 0.716 | 0.675 | 0.730 | 0.887 | 0.878 | 0.844
8 | 0.817 | 0.762 | 0.817 | 0.903 | 0.849 | 0.886
9 | 0.823 | 0.794 | 0.777 | 0.864 | 0.856 | 0.853
Avg | 0.764 | 0.707 | 0.759 | 0.852 | 0.816 | 0.848


Table 2: PreviousHW-based prediction performance comparison (accuracy and F1, higher is better; Meanscore and KT-IDEM are baselines) for the AllStLearn group.
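The three evaluation metrics can be written in plain Python for reference. This sketch assumes 0/1 labels with 1 (a correct answer) as the positive class for F1; the text does not state which class is treated as positive.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for continuous scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```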


4.5 Comparative Approaches
In this work, we compare the performance of our proposed methods with two competitive baseline approaches.


(i) Average grade of the previous homeworks. We calculate the mean score of a given student's previous homeworks to predict their future performance; this baseline is denoted as Meanscore. We use this method to compare our prediction results on StMed.
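A sketch of the Meanscore baseline and its RMSE on homework $H_i$, assuming each student's homework scores are stored in homework order and scaled to [0, 1], and that only students with a score for $H_i$ are evaluated; the helper names are hypothetical.

```python
import math


def meanscore_predict(prev_scores):
    """Meanscore baseline: the mean of a student's previous homework scores."""
    if not prev_scores:
        raise ValueError("no previous homework to average")
    return sum(prev_scores) / len(prev_scores)


def meanscore_rmse(scores_by_student, i):
    """RMSE of Meanscore on homework H_i (1-based).

    scores_by_student: dict student -> list of homework scores in homework order."""
    errors = [meanscore_predict(s[: i - 1]) - s[i - 1]
              for s in scores_by_student.values() if len(s) >= i]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```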


(ii) KT-IDEM [10]. KT-IDEM is a modified version of the original BKT model. By adding an “item” node to every question node, the model is able to identify the different difficulty levels of each question. Since this model can only predict a binary grade, we use it to compare our prediction results on StLearn.


5. RESULTS AND DISCUSSION 


5.1 Assessment Prediction Results 

Figures 3 and 4 show the prediction results with a varying number of regression models for the AllStMed and AllStLearn cohorts, respectively. Analyzing Figure 3, we observe that as the number of regression models increases, the RMSE decreases, and five models seem to be a good choice for all the different homeworks. Comparing the PreviousHW- and PreviousOneHW-based results, we notice that the predictions for all the homeworks (HW3, HW4, HW5 and HW6) benefit from using all the available training data prior to those homeworks, i.e., to predict the grade for $H_i$ it is better to use training information extracted from $H_1, \ldots, H_{i-1}$ rather than just $H_{i-1}$. Similar observations can be made when analyzing the prediction results for the AllStLearn cohort, which includes nine binary (correct/incorrect) homework assessments. Figure 4 shows the accuracy scores (higher is better) for the three experiments. For the PreviousOneHW- and PreviousHW-based experiments, HW5 shows the best prediction results. This suggests that in the middle of a MOOC, students tend to have stable study activities and their performance is more predictable than in other phases. Also, some homeworks are predicted well using training data from only the previous homework (PreviousOneHW-based, e.g., HW).


5.1.1 Comparative Performance 

Table 1 shows the comparison between the baseline approach (Meanscore) and the predictive model for the PreviousHW-based experiments on the AllStMed group. We cannot report results for the KT-IDEM model, since it solves only the binary classification problem. Table 2 shows the comparison of the accuracy and F1 scores on the AllStLearn group with the baseline approaches. We notice that for predicting the second homework, which uses only the information from HW1, the predictive model is not as good as the Meanscore baseline, which suggests that when the necessary amount of training information is lacking, linear regression models cannot always outperform the baseline. But as the training set grows, our approach outperforms the baseline due to the availability of more training data. From Table 2, we also notice that for some homeworks KT-IDEM performs better than PLMR (HW7 in accuracy and HW4 in F1-score). This could be due to unstable academic activities during these two study periods, which can affect the performance of PLMR.


5.1.2 Feature Importance

We test the effect of each feature set on predicting the assessment scores by training the models in the absence of each feature group. For the StLearn course, since there is no limit on homework attempts, we do not add the interval-based feature group to the predictive model. Figure 5 shows the comparison of the prediction results for the AllStMed, PartialStMed, AllStLearn and PartialStLearn cohorts. Analyzing these results, we observe that for the StLearn MOOC, Meanscore is a significant feature, and removing it leads to a substantial decrease in accuracy for both the All and Partial cohorts. For AllStMed, the removal of video related features leads to the largest decrease in performance (i.e., increased RMSE). This suggests that features related to video watching are crucial for predicting the final homework scores. For PartialStMed, using all feature types or a subset does not show a clear winner. This could be due to the varying characteristics of the students within this group.
Another way to analyze feature importance is to exclude the influence of the dominant feature, which is Meanscore in our study. The importance of the $i$-th feature (excluding the Meanscore feature) is evaluated as follows:


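$$I_i = \frac{1}{N} \sum_{n=1}^{N} \frac{\left( \sum_{d=1}^{l} p_{n_s,d}\, f_{n_s,i}\, w_{d,i} \right)^2}{\sum_{k=1}^{n_F} \left( \sum_{d=1}^{l} p_{n_s,d}\, f_{n_s,k}\, w_{d,k} \right)^2} \qquad (5)$$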


where $N$ is the number of test samples and $n_s$ is the student corresponding to the $n$-th test sample. $f_{n_s,i}$ is the value of the $i$-th feature of the interaction between student $n_s$ and the activity in the $n$-th test sample. $n_F$ is the number of features and $l$ is the number of linear regression models. $w_{d,i}$ is the coefficient of the $i$-th feature in the $d$-th linear regression model, and $p_{n_s,d}$ is the membership of student $n_s$ in the $d$-th regression model. We calculate each feature's importance as its percentage contribution to the overall grade prediction. Figure 6 shows the feature importance for the AllStMed group, excluding the Meanscore feature. Apart from Meanscore, NumQuiz and VideoPctWatch are the most important features for the AllStMed group.
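The importance measure of Eq. (5) can be computed in a vectorized way once the memberships $P$ and coefficients $W$ are known. A NumPy sketch using the squared per-feature contributions, assuming the Meanscore column has already been removed from both the test feature matrix `X` (one row per test sample) and `W`, and treating a sample whose contributions are all zero as contributing nothing; `feature_importance` is a hypothetical name.

```python
import numpy as np


def feature_importance(P, W, students, X):
    """Average share of each feature in the squared prediction contributions.

    P: (n_students, l) memberships, W: (l, n_F) coefficients,
    students: (N,) student index of each test sample, X: (N, n_F) features."""
    # contrib[n, i] = sum_d p_{n_s,d} * w_{d,i} * f_{n_s,i}
    contrib = (P[students] @ W) * X
    sq = contrib ** 2
    denom = sq.sum(axis=1, keepdims=True)
    # Per-sample shares; rows whose contributions are all zero stay at 0.
    share = np.divide(sq, denom, out=np.zeros_like(sq), where=denom > 0)
    return share.mean(axis=0)
```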
Figure 6: Feature importance for AllStMed.
6. CONCLUSION AND FUTURE WORK 


In this work, we formulated a personalized multiple linear regression model to predict the homework grades of a student enrolled and participating in a MOOC. Our contributions include engineering features that capture a student's study behavior and learning habits, derived solely from the server logs of MOOCs. We evaluated our framework on two OpenEdX MOOC courses provided by an initiative at Stanford University. Our experimental evaluation shows improved performance in predicting real-time homework scores compared to baseline methods. We also studied different groups of student participants, reflecting their different motivations. Features associated with engagement (logging in multiple times) and with studying materials (viewing videos and attempting quizzes) were found to be important, along with prior homework scores, for this prediction problem.